Towards Multilingual Information Discovery through a SOM based Text Mining approach

Authors

  • Chung-Hong Lee
  • Hsin-Chang Yang
Abstract

Text mining has been gaining popularity in the knowledge discovery field, particularly with the increasing availability of digital documents in various languages from all around the world. However, most current text mining tools focus on processing monolingual documents (particularly English documents); little attention has been paid to applying these techniques to documents in Asian languages, or to extending the mining algorithms to support multilingual information sources. This paper describes our approach for concept discovery from multilingual text collections through a text mining technique. Using a variation of automatic clustering techniques that applies a neural network approach, namely Self-Organizing Maps (SOM), we have conducted several experiments to uncover associated documents in a Chinese corpus and in Chinese-English bilingual parallel corpora. The initial experiments show some interesting results and suggest a couple of potential directions for future work in the field of multilingual information discovery.
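
The abstract does not spell out how documents are encoded or how the map is configured, so the following is only a minimal sketch of generic SOM clustering over bag-of-words vectors, not the authors' method. The vocabulary, the toy document vectors, the grid size, and the decay schedule are all assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid cell whose weight vector is closest to x (squared Euclidean)."""
    return min(
        weights,
        key=lambda cell: sum((wi - xi) ** 2 for wi, xi in zip(weights[cell], x)),
    )


def train_som(vectors, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM and return a dict mapping (row, col) to weights."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for x in rng.sample(vectors, len(vectors)):  # shuffled pass over documents
            bmu = best_matching_unit(weights, x)
            for cell, w in weights.items():
                d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical vocabulary: ["market", "stock", "bank", "match", "team", "goal"].
# Each row is a binary term-occurrence vector for one toy document.
docs = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]

som = train_som(docs)
for index, doc in enumerate(docs):
    print(index, best_matching_unit(som, doc))
```

Documents that land on the same or neighbouring map cells are treated as associated. For a bilingual parallel corpus, one option (again an assumption) is to build each vector over a combined vocabulary of both languages so that aligned documents share a representation.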

Similar resources

Text Mining Based on Self-Organizing Map Method for Arabic-English Documents

Computer information and retrieval is becoming increasingly sophisticated and is being exploited in more and more spheres of human activity. Many computer applications are developed as information distribution systems, of which the Internet is one of the best known and widely used. With enormous quantities of data in different languages available on the net, it is essential that more efficient ...

A method for multilingual text mining and retrieval using growing hierarchical self-organizing maps

With the increasing amount of multilingual texts on the Internet, multilingual text retrieval techniques have become an important research issue. However, the discovery of relationships between different languages remains an open problem. In this paper we propose a method that applies the growing hierarchical self-organizing map (GHSOM) model to discover knowledge from multilingual text docu...

A multilingual text mining approach to web cross-lingual text retrieval

To enable concept-based cross-lingual text retrieval (CLTR) using multilingual text mining, our approach will first discover the multilingual concept–term relationships from linguistically diverse textual data relevant to a domain. Second, the multilingual concept–term relationships, in turn, are used to discover the conceptual content of the multilingual text, which is either a document contai...

A model for extracting information from text documents, based on text mining in the e-learning domain

As computer networks become the backbones of science and economy, enormous quantities of documents become available. So, for extracting useful information from textual data, text mining techniques have been used. Text mining has become an important research area that discovers unknown information, facts or new hypotheses by automatically extracting information from different written documents. T...

Towards Web Mining of Query Translations for Cross-Language Information Retrieval in Digital Libraries

This paper proposes an efficient client-server-based query translation approach that allows a more feasible implementation of cross-language information retrieval (CLIR) services in digital library (DL) systems. A centralized query translation server is constructed to process the translation requests of cross-lingual queries from connected DL systems. To extract translations not covered by standa...

Publication date: 2000